Cache miss prediction method and apparatus.



In a data processing system which employs a cache memory feature, a method and exemplary special
purpose apparatus for practicing the method are disclosed to lower the cache miss ratio for called operands.
Recent cache misses are stored in a first in, first out miss stack, and the stored addresses are searched for
displacement patterns thereamong. Any detected pattern is then employed to predict a succeeding cache miss
by prefetching from main memory the signal identified by the predictive address. The apparatus for performing
this task is preferably hard wired for speed purposes and includes subtraction circuits for evaluating variously
displaced addresses in the miss stack and comparator circuits for determining if the outputs from at least two
subtraction circuits are the same, indicating a pattern yielding information which can be combined with an
address in the stack to develop a predictive address.

CACHE MISS PREDICTION METHOD AND APPARATUS



Field of the Invention



This invention relates to the art of data processing systems which include a cache memory feature and,
more particularly, to a method and apparatus for predicting cache misses for operand calls and
using this information to transfer data from a main memory to cache memory to thereby lower the
cache miss ratio.



Background of the Invention



The technique of employing a high speed cache memory intermediate a processor and a main memory
to hold a dynamic subset of the information in the main memory in order to speed up system
operation is well known. Briefly, the cache holds a dynamically variable collection of main memory information
fragments selected and updated such that there is a good chance that the fragments will include
information required by the processor in upcoming operations. If there is a
cache hit on a given operation, the information is available to the processor much faster than if main memory had to be
accessed to obtain the same information. Consequently, in many high performance data processing
systems, the cache miss ratio is one of the major limitations on the system execution rate.

The key to obtaining a low cache miss ratio is obviously one of carefully selecting the information to be
placed in the cache from main memory at any given instant. There are several techniques for selecting
blocks of instructions for transitory residence in the cache, and the more or less linear use of instructions
renders these techniques statistically effective. However, the selection of
operand information to be resident in cache memory at a given instant has been much less effective and has been generally
limited to transferring one or more contiguous blocks including a cache miss address. This approach only
slightly lowers the cache miss ratio and is also an ineffective use of cache capacity.

Thus, those skilled in the art will understand that it would be highly desirable to provide means for
selecting operand information for transitory storage in a cache memory in such a manner as to significantly
lower the cache miss ratio, and it is to that end that the present invention is directed.
Summary of the Invention



Briefly, these and other objects of the invention are achieved by special purpose apparatus in the cache
memory which stores recent cache misses and searches for patterns therein. Any detected pattern is then
employed to anticipate a succeeding cache miss by prefetching from main memory the block containing
the predicted cache miss.



Description of the Drawing

The subject matter of the invention is particularly pointed out and distinctly claimed in the concluding
portion of the specification. The invention, however, both as to organization and method of operation, may
best be understood by reference to the following description taken in conjunction with the subjoined claims
and the accompanying drawing of which:

FIG. 1 is a generalized block diagram of a typical data processing system employing a cache
memory and therefore constituting an exemplary environment for practicing the invention;

FIG. 2 is a flow diagram illustrating, in simplified form, the sequence of operations by which the
invention is practiced;

FIG. 3 is a logic diagram of a simple exemplary embodiment of the invention; and

FIG. 4 is a logic diagram of a more powerful exemplary embodiment of the invention.


Detailed Description of the Invention 

Referring now to FIG. 1, there is shown a high level block diagram for a data processing system
incorporating a cache memory feature. Those skilled in the art will appreciate that this block diagram is only
exemplary and that many variations on it are employed in practice. Its function is merely to provide a
context for discussing the subject invention. Thus, the illustrative data processing system includes a main
memory unit 13 which stores the data signal groups (i.e., information words, including instructions and
operands) required by a central processing unit 14 to execute the desired procedures. Signal groups with
an enhanced probability for requirement by the central processing unit 14 in the near term are transferred
from the main memory unit 13 (or a user unit 15) through a system interface unit 11 to a cache memory
unit 12. (Those skilled in the art will understand that, in some data processing system architectures, the
signal groups are transferred over a system bus, thereby requiring an interface unit for each component
interacting with the system bus.) The signal groups are stored in the cache memory unit 12 until requested
by the central processing unit 14. To retrieve the correct signal group, address translation apparatus 16 is
typically incorporated to convert a virtual address (used by the central processing unit 14 to identify the
signal group to be fetched) to the real address used for that signal group by the remainder of the data
processing system to identify the signal group.

The information stored transiently in the cache memory unit 12 may include both instructions and
operands stored in separate sections or stored homogeneously. Preferably, in the practice of the present
invention, instructions and operands are stored in separate (at least in the sense that they do not have
commingled addresses) memory sections in the cache memory unit 12 inasmuch as it is intended to invoke
the operation of the present invention as to operand information only.

The present invention is based on recognizing and taking advantage of sensed patterns in cache
misses resulting from operand calls. In an extremely elementary example, consider a sensed pattern in
which three consecutive misses ABC are, in fact, successive operand addresses with D being the next
successive address. This might take place, merely by way of example, in a data manipulation process
calling for successively accessing successive rows in a single column of data. If this pattern is sensed, the
likelihood that signal group D will also be accessed, and soon, is enhanced such that its prefetching into the
cache memory unit 12 is in order.

The fundamental principles of the invention are set forth in the operational flow chart of FIG. 2. When a
processor (or other system unit) asks for an operand, a determination is made as to whether or not the
operand is currently resident in the cache. If so, there is a cache hit (i.e., no cache miss), the operand is
sent to the requesting system unit and the next operand request is awaited. However, if there is a cache
miss, the request is, in effect, redirected to the (much slower) main memory.

Those skilled in the art will understand that the description to this point of FIG. 2 describes cache
memory operation generally. In the present invention, however, the address of the cache miss is
meaningful. It is therefore placed at the top of a miss stack to be described in further detail below. The miss
stack (which contains a history of the addresses of recent cache misses in consecutive order) is then
examined to determine if a first of several patterns is present. This first pattern might be, merely by way of
example, contiguous addresses for the recent cache misses. If the first pattern is not sensed, additional
patterns are tried. Merely by way of example again, a second pattern might be recent cache misses calling
for successive addresses situated two locations apart. So long as there is no pattern match, the process
continues through the pattern repertoire. If there is no match when all patterns in the repertoire have been
examined, the next cache miss is awaited to institute the process anew.

However, if a pattern in the repertoire is detected, a predictive address is calculated from the
information in the miss stack and from the sensed pattern. This predictive address is then employed to
prefetch from main memory into cache the signal group identified by the predictive address. In the
elementary example previously given, if a pattern is sensed in which consecutive operand cache miss
addresses ABC are consecutive and contiguous, the value of the predictive address, D, will be C
+ 1.
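The flow of FIG. 2 can be approximated in software. The following is only a minimal sketch, not the disclosed hardware: the names `predict_on_miss`, `PATTERN_REPERTOIRE`, `STACK_DEPTH` and `HISTORY` are hypothetical, the repertoire holds just the two example patterns mentioned above (contiguous addresses and addresses two locations apart), and the assumption that three recent misses must agree follows the elementary ABC example.

```python
from collections import deque

# Hypothetical repertoire: each pattern is a fixed address displacement.
PATTERN_REPERTOIRE = (1, 2)   # contiguous, then two locations apart
STACK_DEPTH = 16              # assumed depth of the miss stack
HISTORY = 3                   # assumed number of recent misses that must agree


def predict_on_miss(miss_stack, miss_address):
    """Record a cache miss; return a predictive address, or None if no pattern matches."""
    miss_stack.appendleft(miss_address)      # newest miss goes on top of the stack
    if len(miss_stack) < HISTORY:
        return None
    recent = list(miss_stack)[:HISTORY]      # newest first
    for stride in PATTERN_REPERTOIRE:        # step through the pattern repertoire
        if all(recent[i] - recent[i + 1] == stride for i in range(HISTORY - 1)):
            return recent[0] + stride        # extend the sensed pattern by one step
    return None                              # no match: await the next cache miss


stack = deque(maxlen=STACK_DEPTH)            # the oldest entry drops off the bottom
predictions = [predict_on_miss(stack, a) for a in (100, 101, 102)]
# predictions == [None, None, 103]
```

With misses at 100, 101 and 102 playing the role of A, B and C, the third miss yields the predictive address 103, i.e. C + 1.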

In order to optimize the statistical integrity of the miss stack, the predictive address itself may be
placed at the top of the stack since it would (highly probably) itself have been the subject of a cache miss if
it had not been prefetched in accordance with the invention.
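The effect of placing the predictive address on the miss stack can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the disclosed apparatus: the cache is idealized as an unbounded set with no evictions, a pattern is taken to be any equal, nonzero displacement among the three newest stack entries, and `count_misses` is a hypothetical name.

```python
from collections import deque


def count_misses(addresses, push_predictions):
    """Count cache misses for an operand address trace under a simple stride predictor."""
    cache = set()                    # idealized cache: unbounded, never evicts
    miss_stack = deque(maxlen=16)    # newest entry at index 0
    misses = 0
    for addr in addresses:
        if addr in cache:
            continue                 # cache hit: nothing to record
        misses += 1
        cache.add(addr)              # fetched from main memory on the miss
        miss_stack.appendleft(addr)
        if len(miss_stack) < 3:
            continue
        a, b, c = miss_stack[0], miss_stack[1], miss_stack[2]
        if a - b == b - c and a != b:
            predicted = a + (a - b)
            cache.add(predicted)     # prefetch the predicted signal group
            if push_predictions:
                # Record the prefetched address as if it had been a miss.
                miss_stack.appendleft(predicted)
    return misses


without_push = count_misses(range(16), push_predictions=False)   # 12
with_push = count_misses(range(16), push_predictions=True)       # 9
```

For a sequential scan of sixteen addresses, omitting the predicted address from the stack leaves a hole in the recorded history after every prefetch, so the pattern must be relearned from three fresh misses each time. Recording the predicted address keeps the history contiguous and the same trace incurs fewer misses in this toy model.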

Since speed of operation is essential, the invention may advantageously be embodied in a hard wired
form (e.g., in a gate array) although firmware control is contemplated. Consider first a relatively simple
hardwired implementation shown in FIG. 3. A miss stack 20 holds the sixteen most recent cache miss
addresses, the oldest being identified as address P with entry onto the stack being made at the top. Four
four-input electronic switches 21, 22, 23, 24 are driven in concert by a shift pattern signal via line 25 such
that in a first state, addresses A, B, C, D appear at the respective outputs of the switches; in a second state,
addresses B, D, F, H appear at the outputs; in a third state, addresses C, F, I, L appear at the outputs; and
in a fourth state, addresses D, H, L, P appear at the outputs. Subtraction circuits 26, 27, 28 are connected
to receive as inputs the respective outputs of the electronic switches 21, 22, 23, 24 such that: the output
from the subtraction circuit 26 is the output of the switch 21 minus the output of the switch 22; the output
from the subtraction circuit 27 is the output of the switch 22 minus the output of the switch 23; and the
output from the subtraction circuit 28 is the output of the switch 23 minus the output of the switch 24.

The output from the subtraction circuit 26 is applied to one input of an adder circuit 31 which has its
other input driven by the output of the electronic switch 21. In addition, the output from the subtraction
circuit 26 is also applied to one input of a comparator circuit 29. The output from the subtraction circuit 27
is applied to the other input of the comparator circuit 29 and also to one input of another comparator circuit
30 which has its other input driven by the output of the subtraction circuit 28. The outputs from the
comparator circuits 29, 30 are applied, respectively, to the two inputs of an AND-gate 32 which selectively
issues a prefetch enable signal.

Consider now the operation of the circuit shown in FIG. 3. As previously noted, miss stack 20 holds the
last sixteen cache miss addresses, address A being the most recent. When the request for the signal group
identified by address A results in a cache miss, circuit operation is instituted to search for a pattern among
the addresses resident in the miss stack. The electronic switches 21, 22, 23, 24 are at their first state such
that address A is passed through to the output of switch 21, address B appears at the output of switch 22,
address C appears at the output of switch 23 and address D appears at the output of switch 24. If the
differences between A and B, B and C, and C and D are not all equal, not all the outputs from the
subtraction circuits 26, 27, 28 will be equal such that one or both the comparator circuits 29, 30 will issue a
no compare; and AND-gate 32 will not be enabled, thus indicating a "no pattern match found" condition.
The switches are then advanced to their second state in which addresses B, D, F, H appear at their
respective outputs. Assume now that (B - D) = (D - F) = (F - H); i.e., a sequential pattern has been sensed
in the address displacements. Consequently, both the comparators 29, 30 will issue compare signals to fully
enable the AND-gate 32 and produce a prefetch enable signal. Simultaneously, the output from the adder
circuit 31 will be the predictive address (B + (B - D)). It will be seen that this predictive address extends the
sensed pattern and thus increases the probability that the prefetched signal group will be requested by the
processor, thereby lowering the cache miss ratio.

If a pattern had not been sensed in the address combination BDFH, the electronic switches would
have been advanced to their next state to examine the address combination CFIL and then on to the
address combination DHLP if necessary. If no pattern was sensed, the circuit would await the next cache
miss which will place a new entry at the top of the miss stack and push address P out the bottom of the
stack before the pattern match search is again instituted.
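The behavior of the FIG. 3 circuit can be modeled in a few lines of software. The sketch below is an illustrative model only: the function name `search_miss_stack` and the representation of the miss stack as a sixteen-element sequence (index 0 for address A, index 15 for address P) are assumptions, while the four tap selections and the subtract, compare, AND and add steps follow the description above.

```python
# Tap selections for the four switch states; index 0 is address A (newest), 15 is P.
SWITCH_STATES = (
    (0, 1, 2, 3),      # A, B, C, D
    (1, 3, 5, 7),      # B, D, F, H
    (2, 5, 8, 11),     # C, F, I, L
    (3, 7, 11, 15),    # D, H, L, P
)


def search_miss_stack(miss_stack):
    """Return the predictive address for a 16-entry miss stack, or None if no state matches."""
    for taps in SWITCH_STATES:                               # shift pattern signal on line 25
        s21, s22, s23, s24 = (miss_stack[i] for i in taps)   # outputs of switches 21 to 24
        d26 = s21 - s22                                      # subtraction circuit 26
        d27 = s22 - s23                                      # subtraction circuit 27
        d28 = s23 - s24                                      # subtraction circuit 28
        prefetch_enable = (d26 == d27) and (d27 == d28)      # comparators 29, 30 and AND-gate 32
        if prefetch_enable:
            return s21 + d26                                 # adder circuit 31
    return None


# Only B, D, F, H (1000, 992, 984, 976) are equally displaced in this example stack.
example = [5000, 1000, 7, 992, 31, 984, 64, 976, 3, 90, 12, 45, 77, 5, 18, 260]
predicted = search_miss_stack(example)   # 1008, i.e. B + (B - D)
```

In this example the first switch state fails because A, B, C, D are not equally displaced, and the second state succeeds, giving the predictive address B + (B - D) = 1008.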

Consider now the somewhat more complex and powerful embodiment of the invention illustrated in FIG.
4. Electronic switches 41, 42, 43, 44 receive at their respective inputs recent cache miss addresses as
stored in the miss stack 40 in the exemplary arrangement shown. It will be noted that each of the electronic
switches 41, 42, 43, 44 has eight inputs which can be sequentially selectively transferred to the single
outputs under the influence of the shift pattern signal. It will also be noted that the miss stack 40 stores, in
addition to the sixteen latest cache miss addresses A - P, three future entries WXY. Subtraction circuits 45,
46, 47 perform the same office as the corresponding subtraction circuits 26, 27, 28 of the FIG. 3
embodiment previously described. Similarly, adder circuit 48 corresponds to the adder circuit 31 previously
described.

Comparator circuit 49 receives the respective outputs of the subtraction circuits 45, 46, and its output is
applied to one input of an AND-gate 38 which selectively issues the prefetch enable signal. Comparator
circuit 50 receives the respective outputs of the subtraction circuits 46, 47, but, unlike its counterpart
comparator 30 of the FIG. 3 embodiment, its output is applied to one input of an OR-gate 39 which has its
other input driven by a reduce lookahead signal. The output of OR-gate 39 is coupled to the other input of
AND-gate 38. With this arrangement, activation of the reduce lookahead signal enables OR-gate 39 and
partially enables AND-gate 38. The effect of applying the reduce lookahead signal is to compare only the
outputs of the subtraction circuits 45, 46 in the comparator circuit 49 such that a compare fully enables the
AND-gate 38 to issue the prefetch enable signal. This mode of operation may be useful, for example, when
the patterns seem to be changing every few cache misses, and it favors the most recent examples.
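The gating just described reduces to a short Boolean expression. The following sketch models it in software; the function name `prefetch_enable` and its argument names are hypothetical, and addresses are treated as plain integers.

```python
def prefetch_enable(s41, s42, s43, s44, reduce_lookahead=False):
    """Model of the FIG. 4 enable logic for one switch state (switch outputs s41 to s44)."""
    d45 = s41 - s42                      # subtraction circuit 45
    d46 = s42 - s43                      # subtraction circuit 46
    d47 = s43 - s44                      # subtraction circuit 47
    cmp49 = d45 == d46                   # comparator circuit 49
    cmp50 = d46 == d47                   # comparator circuit 50
    or39 = cmp50 or reduce_lookahead     # OR-gate 39
    return cmp49 and or39                # AND-gate 38


# The three newest addresses are equally displaced but the fourth is not.
normal = prefetch_enable(108, 104, 100, 50)                            # False
reduced = prefetch_enable(108, 104, 100, 50, reduce_lookahead=True)    # True
```

With the reduce lookahead signal applied, only the two most recent displacements need agree, so a pattern that has just begun can already trigger a prefetch.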

With the arrangement of FIG. 4, it is advantageous to try all the patterns within pattern groups (as
represented by the "YES" response to the ">1 PATTERN GROUP?" query in the flow diagram of FIG. 2)
even if there is a pattern match detected intermediate the process. This follows from the fact that more than
one of the future entries WXY to the miss stack may be developed during a single pass through the pattern
repertoire or even a subset of the pattern repertoire. With the specific implementation of FIG. 4 (which is
only exemplary of many possible useful configurations), the following results are obtainable:

The goal states are searched in groups by switch state; i.e.: Group 1 includes switch states 0, 1, 2 and
could result in filling future entries WXY; Group 2 includes states 3, 4 and could result in filling entries WX;
Group 3 includes states 5, 6 and could also result in filling entries WX; and Group 4 includes state 7 and
could result in filling entry W. When a goal state is reached that has been predicted, the search is halted for
the current cache miss; i.e., it would not be desirable to replace an already developed predictive address W
with a different predictive address W.
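The grouped search can be sketched as follows. This is a loose illustration with several assumptions: the text gives only which future entries each group may fill, so the assignment of individual switch states to goals within a group (in the order W, X, Y) is assumed here; `search_groups` is a hypothetical name; and the per-state pattern test is abstracted into a caller-supplied function `predict_for_state` that returns a predictive address or None.

```python
# Assumed mapping of switch state to goal entry within each group.
GROUPS = (
    ((0, "W"), (1, "X"), (2, "Y")),   # Group 1
    ((3, "W"), (4, "X")),             # Group 2
    ((5, "W"), (6, "X")),             # Group 3
    ((7, "W"),),                      # Group 4
)


def search_groups(predict_for_state):
    """Fill future entries W, X, Y; halt when a goal that is already predicted is reached."""
    future = {}
    for group in GROUPS:
        for state, goal in group:
            if goal in future:
                return future         # do not replace an already developed prediction
            address = predict_for_state(state)
            if address is not None:
                future[goal] = address
    return future


# Example: suppose only switch states 0, 1 and 2 would detect a pattern.
hits = {0: 500, 1: 600, 2: 700}
entries = search_groups(hits.get)     # {'W': 500, 'X': 600, 'Y': 700}
```

Here all three states of Group 1 are tried even though state 0 already matched, and the search halts on reaching state 3 because its goal W has already been developed.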

Those skilled in the art will understand that the logic circuitry of FIGs. 3 and 4 is somewhat simplified
since multiple binary digit information is presented as if it were single binary digit information. Thus, in
practice, arrays of electronic switches, gates, etc. will actually be employed to handle the added dimension
as may be necessary and entirely conventionally. Further, timing signals and logic for incorporating the
inventive structure into a given data processing system environment will be those appropriate for that
environment and will be the subject of straightforward logic design.

Thus, while the principles of the invention have now been made clear in an illustrative embodiment,
there will be immediately obvious to those skilled in the art many modifications of structure, arrangements,
proportions, the elements, materials, and components, used in the practice of the invention which are
particularly adapted for specific environments and operating requirements without departing from those
principles.



Claims 

1. In a data processing system incorporating a cache memory, a method for predicting cache miss 
addresses comprising the steps of: 

A) establishing a miss stack for storing a plurality of cache miss addresses; 

B) waiting for a cache miss; 

C) when a cache miss occurs, placing the address of the called information onto the top of the miss
stack;

D) examining the miss stack for an address pattern among the resident cache miss addresses;

E) if a pattern is not sensed, returning to step B); and

F) if a pattern is sensed:

1) using the sensed pattern and at least one of the addresses in the miss stack to calculate a
predictive address;

2) prefetching into cache memory the signal group identified by the predictive address; and

3) returning to step B).

2. The method of Claim 1 in which, during step D), a repertoire of predetermined address patterns are
searchable, and the examination continues from pattern to pattern until a first pattern match is sensed.

3. In a data processing system incorporating a cache memory, a method for predicting cache miss
addresses comprising the steps of:

A) establishing a miss stack for storing a plurality of cache miss addresses;

B) waiting for a cache miss;

C) when a cache miss occurs, placing the address of the called information onto the top of the miss
stack;

D) examining the cache miss addresses resident in the miss stack for a match with a selected
address pattern in a current group of a plurality of groups of patterns;

E) if the selected pattern is not sensed, determining if all the patterns in the current group have been
examined;

F) if all the patterns in the current group have not been examined, selecting another pattern in the
current group and returning to step D);

G) if all the patterns in all the groups in the pattern repertoire have been searched, returning to step
B);

H) if all the patterns in the current group have been examined, assigning another group as the current
group, selecting a pattern from the new current group and returning to step D); and

I) if the selected pattern is sensed:

1) using the sensed pattern and at least one of the addresses in the miss stack to calculate a
predictive address;

2) prefetching into cache memory the signal group identified by the predictive address; and
3) assigning another group as the current group and returning to step D).

4. The method of Claim 3 in which, intermediate substeps I)1) and I)3), there is performed substep I)2)a)
in which the predictive address is placed onto the miss stack.

5. Apparatus for developing a predictive address for prefetching information into a cache memory
comprising:

A) a first in, first out stack for storing a plurality of addresses representing cache misses;

B) a plurality of electronic switch means each having a plurality of address inputs and a single
address output;

C) means coupling said addresses stored in said stack individually to said electronic switch means
inputs in predetermined orders;

D) means for switching said electronic switch means to transfer said addresses applied to said
electronic switch means inputs to said electronic switch outputs to establish at said electronic switch
outputs predetermined combinations of said addresses;

E) at least two subtraction circuit means each coupled to receive a pair of different addresses from
said electronic switch means outputs and to issue a value representing the displacement therebetween;

F) at least one comparator circuit means coupled to receive a pair of outputs from a corresponding
pair of said subtraction circuit means and responsive thereto for issuing a prefetch enable logic signal if
there is a compare condition; and

G) predictive address development means adapted to combine one of said addresses appearing at
one of said electronic switch outputs and displacement information appearing at one of said subtraction
circuit means to obtain a predictive address;

whereby, the coordinated presence of said predictive address and said prefetch enable logic signal causes
a signal group identified by said predictive address to be prefetched into said cache memory.

6. The apparatus of Claim 5 which includes at least three of said subtraction circuit means and at least
two of said comparator circuit means and which further comprises:

A) AND-gate means having separate inputs respectively receiving outputs coupled from said at least two
comparator circuit means, said AND-gate selectively issuing said prefetch enable logic signal only when
fully enabled.

7. The apparatus of Claim 6 which further includes:

A) OR-gate means driving at least one input to said AND-gate means, said OR-gate means having inputs
receiving:

1. outputs coupled from at least one of said comparator circuits; and

2. a selectively applied reduce lookahead logic signal;

whereby, application of said reduce lookahead signal to said OR-gate means partially enables said AND-gate
means driven thereby and thus eliminates said at least one of said comparator circuits from
consideration in the issuance of said prefetch enable logic signal.